Improving cluster analysis by co-initializations

Authors

  • He Zhang
  • Zhirong Yang
  • Erkki Oja
Abstract

Many modern clustering methods employ a non-convex objective function and use iterative optimization algorithms to find local minima. Thus, initialization of these algorithms is very important. Conventionally, the starting guess for the iterations is chosen at random; however, such a simple initialization often leads to poor clusterings. Here we propose a new method to improve cluster analysis by combining a set of clustering methods. Unlike other aggregation approaches, which seek consensus partitions, the participating methods in our approach are used sequentially, providing initializations for each other. We present a hierarchy, from simple to comprehensive, of different levels of such co-initializations. Extensive experimental results on real-world datasets show that a higher level of initialization often leads to better clusterings. In particular, the proposed strategy is especially effective for complex clustering objectives such as our recent cluster analysis method based on low-rank doubly stochastic matrix decomposition (called DCD). An empirical comparison with three ensemble clustering methods that seek consensus clusters confirms the superiority of DCD improved by co-initialization.
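A minimal sketch of the co-initialization idea, assuming scikit-learn's KMeans and GaussianMixture as stand-in participating methods (the paper's own DCD objective and the full multi-level hierarchy are not reproduced here): one method's partition is turned into the starting state of the next method instead of a random guess.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.mixture import GaussianMixture

    def co_initialize(X, n_clusters, random_state=0):
        # Stage 1: a simpler method run from a conventional (random) start.
        km = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit(X)

        # Convert its hard partition into starting centers for the next method.
        means_init = np.vstack([X[km.labels_ == k].mean(axis=0)
                                for k in range(n_clusters)])

        # Stage 2: a more complex model, initialized from stage 1's result
        # rather than from scratch.
        gmm = GaussianMixture(n_components=n_clusters, means_init=means_init,
                              random_state=random_state).fit(X)
        return gmm.predict(X)

Higher levels in the paper's hierarchy chain more methods and let them provide initializations for each other repeatedly; the sketch above shows only a single hand-over between two methods.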


Related articles

Data Clustering Using Evidence Accumulation

We explore the idea of evidence accumulation for combining the results of multiple clusterings. Initially, the n d-dimensional data points are decomposed into a large number of compact clusters; the K-means algorithm performs this decomposition, with several clusterings obtained by N random initializations of K-means. Taking the co-occurrences of pairs of patterns in the same cluster as votes for their ...
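A rough sketch of the evidence-accumulation step described above, assuming scikit-learn's KMeans (the final step of extracting clusters from the accumulated votes, e.g. by hierarchical linkage, is omitted): every pair of points assigned to the same cluster in a run receives one vote.

    import numpy as np
    from sklearn.cluster import KMeans

    def co_association(X, n_clusters, n_runs=30):
        # Accumulate co-occurrence votes over n_runs random K-means initializations.
        n = X.shape[0]
        votes = np.zeros((n, n))
        for seed in range(n_runs):
            labels = KMeans(n_clusters=n_clusters, n_init=1, init="random",
                            random_state=seed).fit_predict(X)
            # One vote for every pair of points placed in the same cluster.
            votes += (labels[:, None] == labels[None, :]).astype(float)
        return votes / n_runs  # fraction of runs in which each pair co-occurred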


Generalizing and Improving Weight Initialization

We propose a new weight initialization suited for arbitrary nonlinearities by generalizing previous weight initializations. The initialization corrects for the influence of dropout rates and of an arbitrary nonlinearity on the variance through simple corrective scalars. Consequently, this initialization requires neither computing mini-batch statistics nor weight pre-initialization. This si...
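A hedged sketch of this kind of corrective scaling (the exact scalars proposed in the paper may differ; the 1/keep_prob dropout correction and the ReLU-style gain used here are illustrative assumptions):

    import numpy as np

    def init_weights(fan_in, fan_out, keep_prob=1.0, gain=np.sqrt(2.0), seed=0):
        # Base variance-preserving scale 1/fan_in, corrected by two scalars:
        # the nonlinearity's effect on variance (e.g. about sqrt(2) for ReLU) and
        # the variance change introduced by dropout with keep probability keep_prob.
        rng = np.random.default_rng(seed)
        std = gain * np.sqrt(1.0 / (fan_in * keep_prob))
        return rng.normal(0.0, std, size=(fan_in, fan_out))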


Bias-Correction Fuzzy C-Regressions Algorithm

In fuzzy clustering, the fuzzy c-means (FCM) algorithm is the most commonly used clustering method. However, the FCM algorithm is sensitive to its initialization. Incorporating FCM into switching regressions, known as fuzzy c-regressions (FCR), shares the same drawback: bad initializations may cause difficulties in obtaining appropriate clustering and regression results. In...


On Initializations for the Minkowski Weighted K-Means

Minkowski Weighted K-Means is a variant of K-Means set in the Minkowski space that automatically computes weights for the features at each cluster. As a variant of K-Means, its accuracy depends heavily on the initial centroids fed to it. In this paper we discuss our experiments comparing six initializations, one random and five defined in the Minkowski space, in terms of their accuracy, proc...
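For context, the per-cluster feature-weighted Minkowski distance that this variant minimizes can be sketched roughly as follows (an assumption-laden illustration; the weight-update rule and the six compared initializations are not reproduced):

    import numpy as np

    def minkowski_weighted_distance(x, centroid, weights, p=2.0):
        # Distance of point x to one cluster's centroid, with that cluster's
        # per-feature weights raised to the same exponent p.
        return float(np.sum((weights ** p) * np.abs(x - centroid) ** p))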


Bias-correction fuzzy clustering algorithms

Keywords: Cluster analysis; Fuzzy clustering; Fuzzy c-means (FCM); Initialization; Bias correction; Probability weight. Fuzzy clustering is generally an extension of hard clustering and it is based on fuzzy membership partitions. In fuzzy clustering, the fuzzy c-means (FCM) algorithm is the most commonly used clustering method. Numerous studies have presented various generalizations o...
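The plain FCM iteration that such bias-correction variants modify can be sketched as follows (standard fuzzy c-means only; the bias-correction and probability-weight terms discussed in the paper are not included):

    import numpy as np

    def fcm(X, c, m=2.0, n_iter=100, seed=0):
        # Plain fuzzy c-means: alternate membership and center updates.
        rng = np.random.default_rng(seed)
        U = rng.dirichlet(np.ones(c), size=X.shape[0])   # memberships, rows sum to 1
        for _ in range(n_iter):
            Um = U ** m
            V = (Um.T @ X) / Um.sum(axis=0)[:, None]     # cluster centers
            d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
            inv = d ** (-2.0 / (m - 1.0))                # u_ik proportional to d_ik^(-2/(m-1))
            U = inv / inv.sum(axis=1, keepdims=True)
        return U, V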



Journal:
  • Pattern Recognition Letters

Volume 45, Issue -

Pages -

Publication date 2014